Alternating least squares as moving subspace correction
In this note we take a new look at the local convergence of alternating
optimization methods for low-rank matrices and tensors. Our abstract
interpretation as sequential optimization on moving subspaces yields insightful
reformulations of some known convergence conditions that focus on the interplay
between the contractivity of classical multiplicative Schwarz methods with
overlapping subspaces and the curvature of low-rank matrix and tensor
manifolds. While the verification of the abstract conditions in concrete
scenarios remains open in most cases, we are able to provide an alternative and
conceptually simple derivation of the asymptotic convergence rate of the
two-sided block power method of numerical linear algebra for computing the dominant
singular subspaces of a rectangular matrix. This method is equivalent to an
alternating least squares method applied to a distance function. The
theoretical results are illustrated and validated by numerical experiments.
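For concreteness, the following is a minimal sketch of the classical two-sided block power iteration for the dominant singular subspaces of a rectangular matrix; the function name, the random initialization, and the fixed iteration count are illustrative choices, not taken from the paper.

    import numpy as np

    def block_power_iteration(A, k, iters=200, seed=0):
        # Alternately map the current basis through A and A^T and
        # re-orthonormalize; the column spans converge to the dominant
        # left/right singular subspaces when sigma_k > sigma_{k+1}.
        n = A.shape[1]
        rng = np.random.default_rng(seed)
        V, _ = np.linalg.qr(rng.standard_normal((n, k)))
        for _ in range(iters):
            U, _ = np.linalg.qr(A @ V)    # left subspace update
            V, _ = np.linalg.qr(A.T @ U)  # right subspace update
        return U, V

In the alternating least squares reading mentioned above, each half-step exactly minimizes ||A - XV^T||_F (respectively ||A - UY^T||_F) over one factor with the other fixed, followed by re-orthonormalization of that factor.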
Local convergence of alternating low-rank optimization methods with overrelaxation
The local convergence of alternating optimization methods with overrelaxation for low-rank matrix and tensor problems is established. The analysis is based on the linearization of the method, which takes the form of an SOR iteration for a positive semidefinite Hessian and can be studied in the corresponding quotient geometry of equivalent low-rank representations. In the matrix case, the optimal relaxation parameter for accelerating the local convergence can be determined from the convergence rate of the standard method. This result relies on a version of Young's SOR theorem for positive semidefinite 2×2 block systems.
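As background, a textbook Young-type relation for 2×2 block SOR (stated here only as an assumed illustration of how such a parameter choice can look, not as the paper's exact criterion) expresses the optimal relaxation parameter through the rate rho of the standard omega = 1 sweep:

    import math

    def optimal_overrelaxation(rho):
        # Classical Young formula: given the asymptotic rate rho (< 1) of the
        # plain alternating sweep, return the best SOR parameter and its rate.
        omega = 2.0 / (1.0 + math.sqrt(1.0 - rho))
        return omega, omega - 1.0  # (omega_opt, accelerated rate)

Under this classical formula, a standard-method rate of rho = 0.75, for instance, would give omega_opt = 4/3 and an accelerated rate of 1/3.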
Towards Practical Control of Singular Values of Convolutional Layers
In general, convolutional neural networks (CNNs) are easy to train, but their
essential properties, such as generalization error and adversarial robustness,
are hard to control. Recent research demonstrated that singular values of
convolutional layers significantly affect such elusive properties and offered
several methods for controlling them. Nevertheless, these methods present an
intractable computational challenge or resort to coarse approximations. In this
paper, we offer a principled approach to alleviating constraints of the prior
art at the expense of an insignificant reduction in layer expressivity. Our
method is based on the tensor-train decomposition; it retains control over the
actual singular values of convolutional mappings while providing a structurally
sparse and hardware-friendly representation. We demonstrate the improved
properties of modern CNNs with our method and analyze its impact on the model
performance, calibration, and adversarial robustness. The source code is
available at: https://github.com/WhiteTeaDragon/practical_svd_conv
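As a loose, purely illustrative sketch of the tensor-train ingredient (a generic TT-SVD of a 4D convolution kernel, not the construction used in the paper, for which the linked repository is the authoritative reference):

    import numpy as np

    def tt_svd(kernel, max_rank):
        # Sequential truncated SVDs (TT-SVD) of a kernel with shape
        # (out_channels, in_channels, kernel_h, kernel_w); returns a list of
        # 3D TT cores whose contraction approximates the kernel.
        dims = kernel.shape
        cores, C, r_prev = [], kernel, 1
        for d in dims[:-1]:
            C = C.reshape(r_prev * d, -1)
            U, s, Vt = np.linalg.svd(C, full_matrices=False)
            r = min(max_rank, s.size)
            cores.append(U[:, :r].reshape(r_prev, d, r))
            C = s[:r, None] * Vt[:r]   # carry the remainder to the next mode
            r_prev = r
        cores.append(C.reshape(r_prev, dims[-1], 1))
        return cores

The paper itself is concerned with the singular values of the induced convolutional mapping rather than of the kernel tensor, so this sketch only shows the decomposition machinery, not the control mechanism.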